Sublexical Translations for Low-Resource Language

Authors

  • Khan Md. Anwarus Salam
  • Setsuo Yamada
  • Tetsuro Nishino

Abstract

Machine Translation (MT) for low-resource languages suffers from low coverage due to Out-of-Vocabulary (OOV) words. In this research, we propose a method that uses sublexical translation to achieve wide coverage in Example-Based Machine Translation (EBMT) from English to Bangla. For sublexical translation, we divide OOV words into sublexical units to obtain translation candidates. Previous methods without sublexical translation failed to find translation candidates for many joint words. Using WordNet and an IPA transliteration algorithm, we also propose translating OOV words with an explanation. The proposed method handles OOV words better than previous approaches: it improved translation quality by 20 points in human evaluation.
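The core idea of sublexical translation (split an OOV word into units the system already knows, then translate each unit) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon `LEXICON`, the helper names `segment` and `translate_oov`, and the rule of preferring the segmentation with the fewest units are all hypothetical choices made for the example. The result is only a literal unit-by-unit candidate that a later step would still need to rank or rephrase.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy English-to-Bangla lexicon standing in for the EBMT
# system's bilingual dictionary (the paper's real resources are not shown).
LEXICON = {
    "book": "বই",
    "shelf": "তাক",
    "sun": "সূর্য",
    "flower": "ফুল",
}


def segment(word: str, lexicon: dict[str, str]) -> list[str] | None:
    """Split `word` into the fewest in-lexicon sublexical units, or return None."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[str, ...] | None:
        # Base case: the whole word has been consumed.
        if start == len(word):
            return ()
        candidate = None
        # Try every known prefix of the remaining suffix.
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece not in lexicon:
                continue
            rest = best(end)
            # Keep the segmentation that uses the fewest units (assumed tie-break).
            if rest is not None and (candidate is None or len(rest) + 1 < len(candidate)):
                candidate = (piece,) + rest
        return candidate

    result = best(0)
    return list(result) if result is not None else None


def translate_oov(word: str, lexicon: dict[str, str]) -> str | None:
    """Build a translation candidate by translating each sublexical unit."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    units = segment(key, lexicon)
    if units is None:
        # No segmentation found: a fallback such as transliteration is needed.
        return None
    return " ".join(lexicon[u] for u in units)
```

For example, `segment("bookshelf", LEXICON)` yields `["book", "shelf"]`, so `translate_oov("bookshelf", LEXICON)` produces the candidate `"বই তাক"`, while a word with no known units returns `None` and would be passed to a fallback such as transliteration.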


Similar resources

Using Sublexical Translations to Handle the OOV Problem in MT

We introduce a method for learning to translate out-of-vocabulary (OOV) words. The method focuses on combining sublexical/constituent translations of an OOV to generate its translation candidates. In our approach, wildcard searches are formulated based on our OOV analysis, aimed at maximizing the probability of retrieving OOVs’ sublexical translations from existing resource of machine translati...


Leveraging translations for speech transcription in low-resource settings

Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource interpretable. We focus on this scenario and explore whether we can improve transcription quality under these extremely low-resource settings with the as...


Learning Translations for Tagged Words: Extending the Translation Lexicon of an ITG for Low Resource Languages

We tackle the challenge of learning part-of-speech classified translations as part of an inversion transduction grammar, by learning translations for English words with known part-of-speech tags, both from existing translation lexica and from parallel corpora. When translating from a low resource language into English, we can expect to have rich resources for English, such as treebanks, and smal...


Pivot-based word alignment

Word alignment is the task of, given two sentences that are translations of each other, determining which words correspond to each other across the two sentences. Word alignment is an important step in the pipeline of constructing a statistical machine translation system, but success at word alignment depends heavily on the quantity of training data available. The traditional methods for comput...


An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages

For many low-resource languages, spoken language resources are more likely to be annotated with translations than with transcriptions. Translated speech data is potentially valuable for documenting endangered languages or for training speech translation systems. A first step towards making use of such data would be to automatically align spoken words with their translations. We present a model ...



Publication date: 2013